Adding new attackmodules #94

Open
wants to merge 2 commits into
base: main

Conversation

Mungusbean

Description

new file: attack-modules/homoglyph_v2_attack.py
new file: attack-modules/payload_mask_attack.py

Homoglyph v2 attack module: a modified version of the original homoglyph attack. It replaces randomly chosen letters with homoglyphs and increases the number of replaced letters with each successive prompt.

Payload Mask attack module: aims to get the LLM to echo back a prompt that would normally be caught by the prompt filter or by embedding-based detection.
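
For reviewers, the escalating replacement described for the Homoglyph v2 module can be sketched roughly as follows. This is a minimal illustration and not the code in `homoglyph_v2_attack.py`: the `HOMOGLYPHS` table, the function names, and the `steps` and `seed` parameters are all assumptions made for the example, and the sketch does not use the moonshot attack module interface.

```python
import random

# Hypothetical, deliberately small table of Latin -> Cyrillic look-alikes.
# The real module may use a different and much larger mapping.
HOMOGLYPHS = {
    "a": "\u0430",
    "c": "\u0441",
    "e": "\u0435",
    "o": "\u043e",
    "p": "\u0440",
    "x": "\u0445",
}


def replace_fraction(text, fraction, rng):
    """Replace roughly `fraction` of the replaceable letters, chosen at random."""
    positions = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    count = round(len(positions) * fraction)
    chosen = set(rng.sample(positions, count))
    return "".join(
        HOMOGLYPHS[ch] if i in chosen else ch for i, ch in enumerate(text)
    )


def escalating_variants(text, steps=4, seed=0):
    """Return `steps` variants, each with a larger share of letters replaced."""
    rng = random.Random(seed)  # seeded so runs are reproducible (assumption)
    return [
        replace_fraction(text, step / steps, rng) for step in range(1, steps + 1)
    ]


if __name__ == "__main__":
    for variant in escalating_variants("open the pod bay doors"):
        print(variant)
```

Each variant keeps the original length and differs from the input only at the randomly chosen positions; the final variant has every replaceable letter swapped.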

Motivation and Context

Contributes potential attack modules to the moonshot project.

Type of Change

Other: adds to the existing attack modules.

Checklist

Please check all the boxes that apply to this pull request using "x":

  • I have tested the changes locally and verified that they work as expected.
  • I have added or updated the necessary documentation (README, API docs, etc.).
  • I have added appropriate unit tests or functional tests for the changes made.
  • I have followed the project's coding conventions and style guidelines.
  • I have rebased my branch onto the latest commit of the main branch.
  • I have squashed or reorganized my commits into logical units.
  • I have added any necessary dependencies or packages to the project's build configuration.
  • I have performed a self-review of my own code.
  • I have read, understood and agree to the Developer Certificate of Origin below, which this project utilises.
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
   have the right to submit it under the open source license
   indicated in the file; or

(b) The contribution is based upon previous work that, to the best
   of my knowledge, is covered under an appropriate open source
   license and I have the right under that license to submit that
   work with modifications, whether created in whole or in part
   by me, under the same open source license (unless I am
   permitted to submit under a different license), as indicated
   in the file; or

(c) The contribution was provided directly to me by some other
   person who certified (a), (b) or (c) and I have not modified
   it.

(d) I understand and agree that this project and the contribution
   are public and that a record of the contribution (including all
   personal information I submit with it, including my sign-off) is
   maintained indefinitely and may be redistributed consistent with
   this project or the open source license(s) involved.

@Mungusbean changed the title from "Changes to be committed:" to "Adding new attackmodules" on Sep 10, 2024
@imda-kelvinkok
Collaborator

imda-kelvinkok commented Sep 24, 2024

@Mungusbean

hello. thanks for your commit and sorry it took a while to get back as we have been swamped lately.

we're currently trying out your attack modules and i noticed that the percentage of characters replaced seems to be appended to the prompt that is sent to the LLM. is this the expected behaviour? (for homoglyph attack)

image

@imda-kelvinkok
Collaborator

imda-kelvinkok commented Sep 27, 2024

@Mungusbean thanks for the prompt updates.

tested the attack modules after the update and both attack modules are working. this will be merged after our QA has verified.

it should be scheduled for our next release, which is in about 2 weeks time

2 participants